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Tandem 3' untranslated regions [UTRs), produced by alternative polyadenylation (APA) in the terminal exon of a gene, 
could have critical roles in regulating gene networks. Here we profiled tandem poIy(A) events on a genome-wide scale 
during the embryonic development of zebrafish [Danio rerio) using a recently developed SAPAS method. We showed that 
43% of the expressed protein-coding genes have tandem 3' UTRs. The average 3' UTR length follows a V-shaped dynamic 
pattern during early embryogenesis, in which the 3' UTRs are first shortened at zygotic genome activation, and then 
quickly lengthened during gastrulation. Over 4000 genes are found to switch tandem APA sites, and the distinct func- 
tional roles of these genes are indicated by Gene Ontology analysis. Three families of c/5-eIements, including miR-430 seed, 
U-rich element, and canonical poIy(A) signal, are enriched in 3' UTR-shortened/ lengthened genes in a stage-specific 
manner, suggesting temporal regulation coordinated by APA and transacting factors. Our results highlight the regulatory 
role of tandem 3' UTR control in early embryogenesis and suggest that APA may represent a new epigenetic paradigm of 
physiological regulations. 



[Supplemental material is available for this article.] 

Embryonic development involves a series of complex but ordered 
cellular processes including cell proliferation, differentiation, and 
migration under robust and precise management by gene regula- 
tory networks (Gilbert 2003). As a major regulatory region termi- 
nating a transcribed mRNA, the 3 ' UTR plays important roles in the 
transcriptional (Veraldi et al. 2001; Brown and Gilmartin 2003; 
Lutz 2008), post-transcriptional (Zhou and King 2004; Kloosterman 
and Plasterk 2006; Plasterk 2006; Zhao and Srivastava 2007; Stefani 
and Slack 2008), and translational (Kuersten and Goodwin 2003; 
de Moor et al. 2005) regulation of genes within the regulatory 
networks, often through the interactions of ds-elements and trans- 
acting factors. The process of alternative polyadenylation (APA) 
in the terminal exon enables the transcription of multiple mRNA 
isoforms with different 3' UTRs from a single gene (called tandem 
3' UTRs) (Edwalds-Gilbert et al. 1997; Lutz 2008; Tian and Graber 
2011). Notably, shortening of tandem 3' UTRs has been found 
to associate with cell proliferation (Sandberg et al. 2008), while 
lengthening of tandem 3' UTRs has been observed during cell 
differentiation (Ji et al. 2009; Shepard et al. 2011). Based on in- 
tegrative analysis of EST, SAGE, and microarray data, tandem 
3' UTR lengthening has been observed during mouse embryo- 
genesis (Ji et al. 2009). However, the specific genes directly reg- 
ulated by APA during vertebrate development remain unknown. 

Several high-throughput sequencing methods have been re- 
cently developed based on the second-generation sequencing 
platforms to deeply sample APA sites in a genome-wide fashion 
(Mangone et al. 2010; Fu et al. 2011; Jan et al. 2011; Shepard et al. 
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2011). Two of these methods have been applied to Caenorhabditis 
elegans along development (Mangone et al. 2010; Jan et al. 2011), 
in which widespread APA events were demonstrated, and a trend of 
3' UTR shortening was observed from embryo to adult (Mangone 
et al. 2010). However, the functional consequences of 3' UTR dy- 
namics on development have not been characterized. We pre- 
viously proposed another high-throughput method, SAPAS (Fu 
et al. 201 1). In this study, we used SAPAS to profile tandem poly (A) 
events throughout the embryonic development of zebrafish (Danio 
rerio), a well-studied model organism for developmental research, 
to address the functional role of tandem 3' UTR control during 
animal development. 

Results 

A comprehensive inventory of poIy(A) sites and tandem 
3' UTR genes 

Eight time points across all major developmental stages were 
sampled with one lane of the Illumina Genome Analyzer 2 plat- 
form assigned for each time point (see Supplemental Fig. SI; 
Supplemental Table SI). Of the 59,294,885 reads obtained after 
mapping and filtering, nearly 90% were mapped to annotated 
3' UTRs or 1-kb downstream regions (Supplemental Fig. S2). We 
pooled the data across all stages and focused on the 108,290 
poly (A) sites with five or more normalized reads (Methods). Only 
12% of these poly (A) sites were mapped to Ensembl transcription 
termination sites (TTS) (Fig. 1A) and another 1% to polyA_DB2 
(Lee et al. 2007), with the remaining 87% being unreported pre- 
viously. To test the authenticity of the novel poly(A) sites, we 
performed 3 'RACE on 30 novel poly(A) sites (five sites for each 
of the location categories shown in Fig. 1A, excluding Ensembl 
TTS) and 28 (93%) were validated (Supplemental Table S2). We also 
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Figure 1 . Summary of poly(A) sites and tandem 3' UTR genes. (A) Genomic locations of poly(A) sites. 
Only poly(A) sites with at least five normalized reads are considered. (B) Position-specific distributions 
of the canonical poly(A) signal hexamer AAUAAA for poly(A) sites from various genomic locations. 
(C) Schematics of tandem 3' UTRs. (D) Examples of tandem-UTR genes with two, three, and seven 3' UTR 
isoforms. (f ) Number of genes with different numbers of tandem poly(A) sites and the comparison with 
Ensembl annotation. (F) Fraction of genes with the total frequency of minor 3' UTR isoforms exceeding 
given thresholds (0.1%, 1%, 2%, 5%, 10%, 15%, and 20%). Frequencies of minor isoforms are cal- 
culated as one minus the top isoform frequency for genes with 2+ poly(A) sites (blue bars), and as one 
minus the sum of the frequencies of the top two isoforms for genes with 3+ poly(A) sites (red bars). 



tandem poly (A) sites in our study (Fig. IE; 
with three examples presented in Fig. ID). 
Some genes with multiple 3' UTR isoforms 
display biased expression to a single iso- 
form (the top isoform) and rarely express 
the other isoforms. However, this is not 
generally the case (Fig. IF). By excluding 
the expression percentage of the top iso- 
form, 58% of the genes with two or more 
poly(A) sites express the remaining iso- 
forms as a minimum of 10% of the total 
expression. Even when excluding the top 
two isoforms [restricted to genes with three 
or more poly(A) sites], over half of the 
genes (53%) express the remaining iso- 
forms at an appreciable level of 5% of 
total expression. These results not only 
reveal the complexity of the tandem UTR 
landscape during development, but also 
impose great challenges on data-processing 
methods, because systematic bias could 
be introduced if only two 3 ' UTR isoforms 
are considered per gene at a time (Fu et al. 
2011). 

The dynamics of tandem 3' UTRs 
during development 



compared our data with another RNA-seq data set (Aanes et al. 
2011), and found that most (>70%) intergenic poly(A) sites were 
located within 5 kb downstream from RNA-seq reads (Supplemental 
Table S3), indicating the authenticity of these novel poly(A) sites. 

The conventional mechanism of polyadenylation relies on 
two ds-elements in the neighborhood of a poly(A) site, i.e., the 
upstream poly(A) signal (PAS) hexamer AAUAAA/AUUAAA bound 
by CPSF and the downstream GU/U-rich element bound by CstF 
(Proudfoot 2011). These two signals are strongest for Ensembl- 
associated poly(A) sites, and are conspicuous for poly(A) sites from 
all of the other genomic regions except CDS (Fig. IB; Supplemental 
Fig. S3). The CDS-associated poly (A) sites show very distinct fea- 
tures: almost no upstream canonical PAS AAUAAA/AUUAAA could 
be found (Supplemental Fig. S4) and there is relatively low 
U-content and higher preference for C/G-content surrounding the 
poly (A) site (Supplemental Fig. S3); a similar phenomenon is also 
observed in plants (Wu et al. 2011). These results suggest a poten- 
tial novel mechanism of polyadenylation, which is independent 
of PAS and downstream GU/U-rich elements, in coding regions of 
genes of eukaryotes. 

Based on this greatly expanded set of poly(A) sites, we defined 
tandem 3' UTRs as previously (Fu et al. 2011) and focused on the 
11,505 genes with at least one SAPAS-sampled poly(A) site in the 
last exon. Up to 6660 (58% of the 11,505 genes, or 43% of all 
the expressed protein-coding genes) genes contained two or more 



We defined the 3' UTR length for a gene 
at a single time point as the average 
length of tandem 3' UTRs weighted by 
3' UTR isoform expression levels (isoform- 
weighted 3' UTR length; Supplemental 
Data). Figure 2A shows that the average 
3' UTR length varies dramatically dur- 
ing early development: first, the 3' UTR 
shortens from 0 hpf (zygote period) to 
4 hpf (blastula period), a time point immediately following zygotic 
genome activation (ZGA; at —2.75 hpf) (Kimmel et al. 1995), then 
significantly lengthens throughout the blastula and gastrula pe- 
riods, and continues to lengthen for the duration of embryonic 
development. The dramatic early changes in 3' UTR length are 
accompanied by high 3' UTR diversity, which is the highest at the 
zygote period (0 hpf), with the presence of approximately three 
tandem poly(A) sites per gene and —9% stage-specific 3' UTR iso- 
forms (Fig. 2A). This diversity then rapidly decreases until the end 
of the segmentation period at 24 hpf, which displays —1.5 3' UTR 
isoforms per gene and <1% stage-specific 3' UTR isoform. 

To determine whether the average pattern of 3' UTR dy- 
namics is due to differential usage of alternative 3 ' UTR isoforms, 
we separated genes with two or more tandem poly(A) sites (tan- 
dem-UTR genes) from genes with a single site (single-UTR genes). 
We found that the 3' UTR dynamics of single-UTR genes contrib- 
ute to the lengthening after ZGA, but not the initial shortening 
(Supplemental Fig. S5). By comparison, the 3' UTR dynamics of 
tandem-UTR genes recapture the early 3' UTR shortening before 
ZGA as well as the lengthening afterward (Supplemental Fig. S5). 
To better present the impact of differential usage of tandem 3 ' UTR 
isoforms, we proposed a normalized 3' UTR length (NUL), which is 
defined as the percentage of isoform-weighted 3' UTR length rel- 
ative to the longest 3' UTR length sampled in our data. Thus, by 
averaging the NUL across all genes at a given developmental stage, 
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Figure 2. Tandem 3' UTRs during development. (A) Characteristics of 
tandem 3' UTRs during development. (***) P < 0.001, and (*) P < 0.05, 
two-tailed permutation test, (hpf) Hours post-fertilization. (B) Mean nor- 
malized 3' UTR length (NUL) across developmental stages. (***) P< 0.001 , 
and (*) P < 0.05, two-tailed permutation test. (C) Number of APAsites- 
switching genes between consecutive developmental time points. 



any change in the average NUL between stages should represent 
differential usage of tandem 3' UTR isoforms. In this way, a much 
sharper V-shaped pattern of 3' UTR dynamics was observed dur- 
ing early development (Fig. 2B). In addition, significant 3' UTR- 
lengthening was observed during the pharyngula and hatching 
periods (Fig. 2B). 

To identify genes with their transcripts characterized by sig- 
nificantly shortened or lengthened 3' UTRs between stages, we 
previously developed a statistical method of testing tandem APA 
sites switching, in which multiple poly(A) sites for one gene could 
be considered simultaneously (Fu et al. 2011). By adopting this 
method, we identified a total of 4374 genes with tandem APA sites 
switched (called APAsites-switching genes hereafter) during de- 
velopment (Fig. 2C; Supplemental Table S4; Supplemental Data). 
Notably, 1471 genes shortened and 441 genes lengthened the 
3' UTRs from 0 to 4 hpf, stressing the extreme divergence of 3' UTR 
usage profiles between the maternal state (0 hpf) and the devel- 
opmental stage immediately following ZGA (4 hpf). Consistent 
with high 3' UTR diversity in early development, most (3581; 82%) 
of the APAsites-switching genes displayed significantly altered 



3' UTR lengths within 12 hpf. Based on the APAsites-switching 
genes, we performed hierarchical clustering on the NUL values 
(Supplemental Fig. S6). The most frequent dynamic 3' UTR pattern 
is consistent with a "narrow" V-shape, with the 3' UTR shortened 
from 0 to 4 hpf, lengthened from 4 to 6 hpf, and remaining stable 
afterward (Supplemental Fig. S7). This pattern was further con- 
firmed by quantitative RT-PCR on selected genes (Supplemental 
Fig. S8; Supplemental Notes 1). These results highlight the com- 
plex landscape of tandem 3 ' UTR dynamics during early vertebrate 
development and suggest that the tandem 3' UTR profiles may be 
tightly regulated during early embryogenesis. 

Functional annotation analysis of APAsites-switching genes 

We performed gene ontology (GO) enrichment/depletion analysis 
on APAsites-switching genes using single-UTR genes as the back- 
ground (Fig. 3A; Supplemental Tables S5, S6) (False discovery rate: 
10%, maximum P < 0.011). GO terms related to protein localiza- 
tion, protein binding, the membrane/endomembrane system, 
signal transduction, and regulation of cellular and biological pro- 
cesses are significantly enriched across multiple developmental 
stages. However, these gene products are rarely found as the con- 
stituents of nonmembrane organelles (such as ribosomes) and 
seldom participate in biosynthetic processes. As for cellular local- 
ization, the products of these genes are enriched in intracellular 
processes and are rarely found in the extracellular region. These 
results imply that tandem 3' UTRs are tightly regulated for genes 
actively involved in intracellular processes, but not for genes in- 
volved in basic cell structure or routine biological processes. 

To evaluate the roles of the regulations of tandem 3 ' UTRs in 
signaling cascades, we mapped the APAsites-switching genes to 
KEGG pathways (Fig. 3B; Supplemental Tables S7, S8) (FDR: 20%, 
maximum P < 0.013). Four pathways, the citrate acid cycle (TCA 
cycle), endocytosis, insulin signaling, and tight junction, are sig- 
nificantly enriched in the APAsites-switching genes across early 
developmental stages. For stage-specific enrichment, two func- 
tionally related pathways, focal adhesion and regulation of actin 
cytoskeleton, are enriched in genes with 3 ' UTR lengthened during 
gastrulation (from 6 to 12 hpf). These two pathways are involved 
in the process of morphogenetic cell movement through in- 
teraction between the extracellular matrix and intracellular cy- 
toskeleton during gastrulation (Bearer 1992). Interestingly, en- 
richment of genes with 3 ' UTR lengthened during gastrulation was 
also observed for Wnt and ErbB signaling, two pathways that are 
known to play pivotal roles in regulation of cell motility to achieve 
gastrulation (Nie and Chang 2007; Roszko et al. 2009). These re- 
sults indicate a potential regulatory role of tandem 3' UTR regu- 
lation in cell localization during gastrulation. 

Developmental stage-specific c/5-eIements in tandem 3' UTRs 

One potential mechanism of APA regulation of mRNAs involves 
the gain and loss of ds-elements in the 3' UTRs (Oh et al. 2000; 
Sandberg et al. 2008). In the case of studying genes with exactly 
two tandem poly(A) sites, the common 3' UTR is defined as the 
3' UTR region between the stop codon and the proximal tandem 
poly (A) site, and the extended 3' UTR as the remainder of the 
3' UTR, i.e., the 3' UTR region between the proximal poly(A) site 
and the distal site (see Fig. 1C; Sandberg et al. 2008). Comparisons 
of the common and extended 3' UTRs have demonstrated sub- 
stantially different profiles of base composition (Ji et al. 2009). How- 
ever, such comparisons have not been performed during specific 
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Figure 3. Functional annotations of APAsites-switching genes. (A) Gene ontology (GO) terms significantly (P< 0.01 1 ; 1 0% false discovery rate) enriched 
and depleted in APAsites-switching genes by taking single-UTR genes as the background across developmental stages. GO terms enriched in APAsites- 
switching genes are assigned a positive P-value (hence, positive FDR in the BH sense), and depleted terms are assigned a negative P-value (hence, negative 
FDR). (S) Genes with 3' UTRs shortened; (L) genes with 3' UTRs lengthened. (B) KEGG terms significantly (P< 0.01 3; 20% FDR) enriched and depleted in 
APAsites-switching gene. Data are presented as in A. 



developmental stages or for the same 3' UTR region (either com- 
mon or extended) between 3' UTR-lengthened and shortened genes. 
This study has provided an opportunity to perform such analysis in 
a developmental stage-specific manner. A major challenge, how- 
ever, is the fact that most (3314 of 4374; 76%) APAsites-switching 
genes have three or more poly(A) sites, making it impossible to 
define common and extended 3 ' UTRs for these multipoly(A) genes 
using conventional definitions. 

We proposed a generalized framework to study motif en- 
richment and depletion in genes with any number of tandem 
poly(A) sites. Conceptually, we defined the extended 3' UTR as the 
"altered" portion of the 3' UTR that differs between two develop- 
mental stages (Methods; see details in Supplemental Methods), 
and the common 3' UTR as the shared portion of the 3' UTRs, i.e., 
the 3' UTR corresponding to the most proximal poly(A) site. To test 
motif enrichment and depletion, we considered all 4- to 7-mers 
and compared the distributions of all of these k-mers between two 
sets of 3 ' UTR regions. First, we compared the common 3 ' UTR with 
the extended 3 ' UTR defined by two consecutive developmental 
time points (Supplemental Figs. S9, S10). Consistent with previous 
studies (Beaudoing et al. 2000), the canonical PAS, AAUAAA, is 
always enriched in the extended 3' UTRs compared with the com- 
mon 3' UTRs. Unsurprisingly, many U-rich elements are enriched in 
the extended 3' UTR because its AU-content is higher than that of 
the common 3' UTR (Ji et al. 2009). However, no developmental 
stage-specific motifs were identified, suggesting that the differences 
in base composition between the common and the "altered" regions 
of the 3' UTRs are intrinsic features and are independent of de- 
velopmental timing. 



Next we focused on the extended 3' UTRs and compared the 
3' UTR-lengthened genes with the 3' UTR-shortened genes (Fig. 4A). 
A number of stage-specific motifs were identified, from which three 
families of motifs could be discerned. The first family contains two 
motifs, GCACTT and AGCACTT, the core "seed" region of miR- 
430. The miR-430 seed is first enriched in the extended 3' UTR of 
genes with 3 ' UTRs shortened from 4 to 6 hpf, leading to the loss 
of miR-430 target sites for these genes. This enrichment then 
switches to the extended 3 ' UTRs of genes with 3 ' UTRs lengthened 
from 6 to 12 hpf, resulting in the gain of miR-430 sites. miR-430 is 
expressed as early as 4 hpf and functions first as a maternal mRNA 
cleaner (Giraldez et al. 2006) and then regulates brain morpho- 
genesis (Giraldez et al. 2005). We argue that the tandem 3' UTR 
dynamics from 4 to 12 hpf could be associated with the dual 
functions of miR-430. From 4 to 6 hpf, of the 3' UTR-shortened 
genes with miR-430 seed located in the extended 3' UTRs, the 
expression levels of the 3' UTR isoforms with miR-430 seed were 
mostly down-regulated (Fig. 4B), which is in agreement with the 
role of miR-430 in mRNA degradation during this period (Giraldez 
et al. 2006; Supplemental Notes 2). In contrast, the mRNA levels of 
the 3 ' UTR isoforms without miR-430 seed were largely stable. Some 
(15%) of these isoforms lacking miR-430 seed were even up-regu- 
lated (twofold, FDR < 0.05), suggesting that these isoforms were 
preferentially selected, potentially by APA, to escape miR-430- 
mediated degradation. This situation is reversed for genes with 3' 
UTRs lengthened from 6 to 12 hpf; that is, the isoforms lacking 
miR-430 seeds were mostly down-regulated, while the miR-430 
seed-containing isoforms were largely up-regulated (Fig. 4C). The 
genes that gained a miR-430 seed during gastrulation were enriched 
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Figure 4. Developmental stage-specific motifs in extended 3' UTRs. (A) Enrichment of motifs in extended 3' UTRs by comparing 3' UTR-lengthened 
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were negatively correlated with the number of U-rich motifs (UUUUU) for genes with the 3' UTR shortened from 0 to 4 hpf. Pearson correlation coefficients 
(r) are shown for both Dand f. (f) Expression levels of 3' UTR isoforms were positively correlated with the number of U-rich motifs (UUUUU) for genes with 
the 3' UTR lengthened from 6 to 1 2 hpf. 



in the GO terms "protein binding" and "signal transduction" 
(Supplemental Table S9) ; and included a number of neurogenesis- 
associated genes such as neol (Shen et al. 2002), smo (Philipp et al. 
2008), and pafahlb lb (Supplemental Fig. Sll; Sun et al. 2009). 
More interestingly, the set of genes that lost a miR-430 seed from 4 
to 6 hpf significantly overlapped with the set that gained a seed 
from 6 to 12 hpf (P < 2.2 X 10" 16 , Fisher's exact test; Supplemental 
Fig. SI 2). This resulted in 47 genes with miR-430 seed lost in the 
blastula and immediately regained in the gastrula period (with four 
examples presented in Supplemental Fig. SI 3). The miR-430 tar- 
gets in two of these genes (mdm2 and ubal) were validated by 
luciferase reporter assays (Supplemental Fig. SI 4). These results 
suggest a cooperative mechanism between APA and microRNAs 
in the regulation of development. 

The second family of motifs contains a number of U-rich el- 
ements represented by UUUUA and stretches of U, which resemble 
cytoplasmic polyadenylation elements (CPE) (Mendez and Richter 
2001) or AU-rich elements (ARE) (Barreau et al. 2005). From 0 to 
4 hpf, these U-rich motifs were enriched in the extended 3 ' UTRs of 
genes with the 3 ' UTRs shortened, and a negative correlation be- 
tween the expression levels of 3 ' UTR isoforms and the number of 
these U-rich motifs was observed (Fig. 4D). Interestingly, the in- 
vestigation of single-UTR genes also identified a significant en- 
richment of U-rich motifs in genes that are down-regulated during 



this period (Supplemental Fig. SI 5). Thus, these U-rich motifs are 
associated with decreased mRNA abundance during the cleavage 
and early blastula periods. Similar to the case of miR-430, the en- 
richment of this family of U-rich motifs switched to genes with the 
3' UTRs lengthened from 6 to 12 hpf. However, a positive corre- 
lation between the expression levels of the 3 ' UTR isoforms and the 
number of U-rich motifs was observed during this interval (Fig. 4E), 
suggesting that the 3' UTR isoforms with these U-rich elements 
are likely stabilized. High mRNA levels of the class-Ill ARE-binding 
protein elavll (also known as HuR) were detected (Supplemental 
Fig. SI 6), suggesting that elavll bound to these U-rich motifs and 
increased the stability of these mRNAs. These data imply a complex 
network potentially coordinated by APA, ds-elements in the 
3' UTRs, and RNA-binding proteins throughout the process of 
early development. 

The third family of motifs contained a number of A-rich se- 
quences similar to the canonical PAS, AAUAAA, and were exclu- 
sively enriched in the extended 3' UTRs of genes with 3' UTRs 
lengthened from 6 to 12 hpf. Indeed, the AAUAAA motif was sig- 
nificantly enriched in the extended 3' UTRs of 3' UTR-lengthened 
genes relative to the extended 3' UTRs of 3' UTR-shortened genes 
exclusively in this period (corrected P= 3.3 X 10~ 7 ). Moreover, while 
the canonical PAS was usually enriched in the extended 3' UTR 
relative to the common 3' UTR of genes, regardless of whether 
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their 3' UTRs were shortened or lengthened, we found that this 
enrichment was not significant for genes with 3' UTRs shortened 
in this period (corrected P= 0.086, Supplemental Fig. S9). However, 
this enrichment was extremely strong for 3' UTR-lengthened genes 
(corrected P = 2.9 X10" 138 , Supplemental Fig. S10). These data 
suggest that poly(A) sites with canonical PAS are preferentially se- 
lected during this particular developmental stage (gastrulation). 

Discussion 

In this study we have presented a global picture of tandem 3' UTR 
dynamics during zebrafish development, which is characterized by 
a dramatic V-shaped pattern in early embryogenesis (Fig. 2). Tan- 
dem 3' UTR lengthening is also observed during mouse embryo- 
genesis (Ji et al. 2009), a finding that is consistent with the 3' UTR 
dynamics observed after ZGA (at 4 hpf) in zebrafish development, 
indicating functional conservation of tandem 3 ' UTR regulation in 
the control of vertebrate embryogenesis. The initial 3' UTR short- 
ening could be explained in two ways: (1) maternal mRNA iso- 
forms with long 3' UTRs are cleared; and (2) zygotic mRNA iso- 
forms with short 3' UTRs are selectively transcribed at ZGA. Our 
data suggest that both of these mechanisms make significant con- 
tributions (Supplemental Notes 3), implying that tandem 3' UTR 
dynamics are shaped by both transcriptional (through APA) and post- 
transcriptional (mainly through trans-acting factors) regulations. 

Our investigation of sequence features of common and ex- 
tended 3' UTRs suggests a coordinated network that is potentially 
modulated through APA and trans-acting factors by controlling the 
gain and loss of ds-elements within 3' UTRs. This coordinated 
network is exemplified in three critical phases of early develop- 
ment: the oocyte-to-embryo transition, the maternal-to-zygotic 
transition, and gastrulation (Fig. 5). In the oocyte-to-embryo tran- 
sition, the zygotic genome is transcriptionally repressed and em- 
bryonic development is activated and controlled by maternal factors 
located in the oocyte cytoplasm (Stitzel and Seydoux 2007). Hun- 
dreds (871 in our data) of translation-dormant maternal mRNAs 
with the CPE UUUUAU (a motif significantly enriched in the ex- 
tended 3 ' UTRs of 3 ' UTR-shortened genes from 0 to 4 hpf) (Fig. 4A) 
in their relatively long 3' UTRs are pre-stored in the oocyte 
(de Moor et al. 2005). After fertilization, these dormant mRNAs are 
re-adenylated and become translationally active through a process 
mediated by CPE (de Moor et al. 2005; Richter 2007). Next, during 
the maternal-to-zygotic transition, the zygotic genome is activated 
and maternal mRNA is cleared (Schier 2007). While maternal mRNAs 




CPE for short 3'UTR 3UTRs 
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activation escape ^ or re S u l at i° n 
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Figure 5. Schematic graph of potential biological functions of 3' UTR 
dynamics involved in early embryogenesis. See text for details. 



are likely to be targeted for degradation through ds-elements (such 
as U-rich elements or miR-430 target sites based on the analysis 
of our data) located in their relatively long 3' UTRs (Walser and 
Lipshitz 2011), newly transcribed mRNAs preferentially display 
short 3' UTRs, possibly to escape improper degradation mediated 
by trans-acting targeting. 

When gastrulation begins, dramatic cell movements take place, 
and more precise control must be imposed on mRNA transcription 
and translation through ds-elements in the 3' UTR regulated by 
trans-acting factors (Stefani and Slack 2008). Correspondingly, 
mRNAs with longer 3' UTRs are transcribed in a coordinated 
fashion. Additionally, the lengthening of 3' UTRs during gastru- 
lation may act as the preparation for organogenesis, a delicate 
process that is tightly controlled. A typical example of this situa- 
tion is the development of the nervous system. As miR-430 is 
known to be indispensible in brain morphogenesis (Giraldez et al. 
2005), hundreds (229 in our data) of genes lengthen their 3' UTRs 
to gain miR-430 target sites during gastrulation. Interestingly, the 
coordination between APA and miRNAs has also been addressed 
for some critical genes during Drosophila embryogenesis (Thomsen 
et al. 2010), suggesting a general regulatory mechanism in control 
of animal development cooperated by APA and miRNAs. Genes 
with the 3 ' UTRs lengthened in this period are also enriched in the 
Wnt and ErbB signaling, two pathways that are known to regulate 
neural tube formation (Birchmeier 2009; Ulloa and Marti 2010). 
These genes include major receptors (such as fzd3a) and critical 
effectors (such as shcl and rock2a; Supplemental Fig. SI 7), sug- 
gesting a key role of APA in control of these signaling pathways. 

Taken together, our data are consistent with a two-layer model 
in which (1) APA regulates the usage of the 3' UTR and, thus, the ds- 
elements of mRNAs; (2) mRNAs are or are not under the control of 
trans-factors, depending on the presence or absence of the corre- 
sponding ds-elements. Together, these two factors, APA and trans- 
factors, orchestrate animal development by modulating mRNAs in 
a spatiotemporal fashion. 

Methods 

Collection of zebrafish embryos at different stages 

Tubingen Zebrafish (Danio rerio) was provided by the North Center 
of National Zebrafish Resources of China. Embryos were obtained 
by natural mating and kept in Holt Buffer (NaCl 3.5 g, KC1 0.05 g, 
NaHC0 3 0.025 g, CaCl 2 0.1 g/lL) in 28°C water. Embryos were 
collected at eight time points (0, 4, 6, 12, 24, 48, 72, and 120 hpf), 
and washed twice in phosphate-buffered saline (PBS). A total of 
1 mL of TRIzol was added per 100 embryos, swirled gently, and 
frozen at -80°C. 

SAPAS library construction 

A sequencing library was constructed as described previously (Fu 
et al. 2011). Briefly, total RNA was extracted from zebrafish embryos 
by TRIzol, and —10 fxg of total RNA was randomly fragmented by 
heating. An anchored oligo d(T) primer and a 5' template switch- 
ing linker tagged with Illumina adaptors were used in template 
switch reverse transcription (RT) by Superscript II reverse tran- 
scriptase from Invitrogen. Two mutations in the poly(A) were in- 
troduced by PCR amplification with a determined number of cycles 
to ensure that the ds cDNA remain in the exponential phase of am- 
plification. The PCR products were recovered after PAGE. The size- 
selection of 250-400 bp was performed by PAGE gel-excision. The 
recovery was quantified by a Qubit 2.0 Fluoromete, and the average 
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size was determined by Agilent 2100 bioanalyzer. A quality control 
was performed by plasmid recombinant and Sanger sequencing. 
The recovery was ligated to pGEM-T Easy Vector and transformed 
into DH5a competent cells. Plasmid DNA was extracted and se- 
quenced by ABI 3730 DNA Analyzer. Each end of the insertion 
sequence should be the Illumina sequence primer. The insertion 
sequence with long poly(A) stretch should be <5%, and most of the 
insertion sequence should be mapped to the zebrafish genome. 

3' RACE and qRT-PCR 

Embryos were recollected and the experiments were performed as 
described previously (Fu et al. 2011). Briefly, for 3 'RACE, 30 novel 
poly(A) sites located in various genomic locations and with relative 
high-expression levels were chosen, and the PCR products were 
sequenced by ABO 730. For qRT-PCR, 1 7 genes with extreme 3 ' UTR 
length change at two developmental stages (0-4 and 4-6 hpf) were 
selected (details in Supplemental Notes 1). All of the primers were 
listed in Supplemental Table S2. 

Luciferase reporter assays 

For plasmids construction, 3' UTRs with different lengths were 
cloned by PCR from zebrafish cDNA. To disrupt the miRNA com- 
plementary site, the nucleotides that paired to nucleotides 3 and 5 
of the miRNA were substituted by mutant PCR. All wild-type and 
mutant 3' UTR isoforms were inserted into luciferase reporter 
vector psiCHECK-2 (Promega) between Xhol and NotI, down- 
stream from the luciferase gene. All of the primers were listed in 
Supplemental Table S10. For luciferase reporter assays, HEK293T 
cells were seeded into each well of 6-well plates and incubated 
overnight, cotransfected with 3' UTR psiCHECK-2 luciferase re- 
porter plasmid (100 ng/well) and miRNA (50 nmol/well) by 
Lipofectamine 2000 reagent (Invitrogen). After 24 h, cells were 
lysed and collected for luciferase reporter assay. Each experiment 
was performed at least in triplicate and repeated at least twice in all 
cases. 



Data analysis 

A full description of the bioinformatics methods was given in the 
Supplemental Methods. Raw data were analyzed mainly as de- 
scribed previously (Fu et al. 2011). Briefly, raw reads were trimmed, 
filtered, and mapped to the zebrafish genome (Zv9) using Bowtie 
(version 0.12.5) (Langmead et al. 2009) and the 5' ends of uniquely 
mapped reads were clustered to define poly (A) sites. Total read 
counts from each stage were normalized to eliminate bias intro- 
duced by unequal read counts from different stages, and poly(A) 
sites with five or more normalized reads were used for further anal- 
ysis. Poly(A) sites with reads that overlapped Ensembl-annotated 
3' UTR(s) were defined as tandem poly (A) sites. 

Tandem APA sites switching between stages was detected by 
a test of linear trend alternative to independence (Agresti 2002) as 
described previously (Fu et al. 2011). Normalized 3' UTR length 
(NUL) was defined as the percentage of isoform-weighted 3' UTR 
length relative to the longest 3' UTR length sampled in our data. 
Gene clustering was performed following the standard WGCNA 
procedure (Langfelder and Horvath 2008) for any pair of genes 
with evidence of APA sites switching. 

For testing GO/KEGG item enrichment or depletion, a Fisher's 
exact test was performed to compare the proportion of APAsites- 
switching genes in one specific item with that proportion of single- 
UTR genes, and FDR was obtained in the Benjamini-Hochberg 
sense by using R. For testing motif enrichment or depletion, we 
considered all possible 4- to 7-mers and searched extended 3' UTR 



regions of 3' UTR-lengthened genes against extended 3' UTR re- 
gions of 3' UTR-shortened genes. To count the occurrences of 
a particular k-mer word in the extended 3' UTR region for a gene 
with any number of tandem 3' UTR isoforms, we averaged the 
occurrences of this k-mer word for each 3 ' UTR isoform weighted 
by the isoform expression level change at that stage. P- values were 
obtained by Fisher's exact tests and Bonferroni corrected. 

Data access 

The raw sequence data can be accessed from the NCBI Sequence 
Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) under ac- 
cession no. SRA036536. 
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